Maximum Entropy Probability Distribution
In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class (usually defined in terms of specified properties or measures), then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.


Definition of entropy and differential entropy

If X is a discrete random variable with distribution given by

:\operatorname{P}(X=x_k) = p_k \quad\text{for } k=1,2,\ldots

then the entropy of X is defined as

:H(X) = - \sum_{k\ge 1} p_k\log p_k .

If X is a continuous random variable with probability density p(x), then the differential entropy of X is defined as

:H(X) = - \int_{-\infty}^\infty p(x)\log p(x)\, dx.

The quantity p(x)\log p(x) is understood to be zero whenever p(x) = 0.

This is a special case of more general forms described in the articles Entropy (information theory), Principle of maximum entropy, and differential entropy. In connection with maximum entropy distributions, this is the only one needed, because maximizing H(X) will also maximize the more general forms.

The base of the logarithm is not important as long as the same one is used consistently: a change of base merely rescales the entropy. Information theorists may prefer to use base 2 in order to express the entropy in bits; mathematicians and physicists often prefer the natural logarithm, resulting in a unit of nats for the entropy.

The choice of the measure dx is, however, crucial in determining the entropy and the resulting maximum entropy distribution, even though the usual recourse to the Lebesgue measure is often defended as "natural".
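
Both definitions are easy to evaluate numerically. The following minimal Python sketch (the distributions chosen are arbitrary illustrations) computes a discrete entropy directly and the differential entropy of a standard normal by quadrature, in nats:

import numpy as np
from scipy.integrate import quad

# Discrete entropy H(X) = -sum_k p_k log p_k for an arbitrary example distribution.
p = np.array([0.2, 0.5, 0.3])
print(-np.sum(p * np.log(p)))                      # ~1.0297 nats

# Differential entropy of a standard normal, by quadrature, versus the
# closed form 0.5*log(2*pi*e). Writing -log p(x) analytically avoids log(0).
def integrand(x):
    p = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    return p * (0.5 * x**2 + 0.5 * np.log(2.0 * np.pi))   # = -p(x) log p(x)

print(quad(integrand, -np.inf, np.inf)[0], 0.5 * np.log(2.0 * np.pi * np.e))  # both ~1.4189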


Distributions with measured constants

Many statistical distributions of applicable interest are those for which the moments or other measurable quantities are constrained to be constants. The following theorem by Ludwig Boltzmann gives the form of the probability density under these constraints.


Continuous case

Suppose S is a closed subset of the real numbers \mathbb{R} and we choose to specify n measurable functions f_1, \ldots, f_n and n numbers a_1, \ldots, a_n. We consider the class C of all real-valued random variables which are supported on S (i.e. whose density function is zero outside of S) and which satisfy the n moment conditions:

:\mathbb{E}[f_j(X)] \geq a_j\quad\text{for } j=1,\ldots,n

If there is a member in C whose density function is positive everywhere in S, and if there exists a maximal entropy distribution for C, then its probability density p(x) has the following form:

:p(x)=\exp\left(\sum_{j=0}^n \lambda_j f_j(x)\right)\quad \text{for all } x\in S

where we assume that f_0(x)=1. The constant \lambda_0 and the n Lagrange multipliers \boldsymbol\lambda=(\lambda_1,\ldots,\lambda_n) solve the constrained optimization problem with a_0=1 (this condition ensures that p integrates to unity):

:\max_{\lambda_0,\boldsymbol\lambda} \left\{\sum_{j=0}^n \lambda_j a_j-\int_S \exp\left(\sum_{j=0}^n \lambda_j f_j(x)\right)dx\right\}\quad \text{subject to } \boldsymbol\lambda\geq\mathbf{0}

Using the Karush–Kuhn–Tucker conditions, it can be shown that the optimization problem has a unique solution because the objective function in the optimization is concave in \boldsymbol\lambda.

Note that if the moment conditions are equalities (instead of inequalities), that is,

:\mathbb{E}[f_j(X)] = a_j\quad\text{for } j=1,\ldots,n,

then the constraint condition \boldsymbol\lambda\geq\mathbf{0} is dropped, making the optimization over the Lagrange multipliers unconstrained.
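
For a concrete illustration of the theorem, the following Python sketch (an added example, not part of the theorem statement) treats the single equality constraint E[X] = a on S = [0, ∞), for which the integral in the dual has the closed form e^{λ₀}/(−λ₁) when λ₁ < 0; maximizing the dual numerically recovers λ₁ = −1/a and λ₀ = −ln a, i.e. the exponential density (1/a)e^{−x/a}:

import numpy as np
from scipy.optimize import minimize

a = 2.0  # prescribed mean (arbitrary example value)

def negative_dual(lmb):
    lam0, lam1 = lmb
    # integral of exp(lam0 + lam1*x) over [0, inf), closed form valid for lam1 < 0
    integral = np.exp(lam0) / (-lam1)
    return -(lam0 * 1.0 + lam1 * a - integral)

# The bound keeps lambda_1 negative so the integral stays finite; with an
# equality constraint the multiplier is otherwise unconstrained.
res = minimize(negative_dual, x0=[0.0, -1.0],
               bounds=[(None, None), (None, -1e-9)], method="L-BFGS-B")
lam0, lam1 = res.x
print(lam0, lam1)              # approximately -log(a) and -1/a
print(np.exp(lam0), -lam1)     # both approximately 1/a = 0.5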


Discrete case

Suppose S = \{x_1, x_2, \ldots\} is a (finite or infinite) discrete subset of the reals, and we choose to specify n functions f_1,\ldots,f_n and n numbers a_1,\ldots,a_n. We consider the class C of all discrete random variables X which are supported on S and which satisfy the n moment conditions

:\operatorname{E}(f_j(X)) \geq a_j\quad\text{for } j=1,\ldots,n

If there exists a member of C which assigns positive probability to all members of S, and if there exists a maximum entropy distribution for C, then this distribution has the following shape:

:\operatorname{P}(X=x_k)=\exp\left(\sum_{j=0}^n \lambda_j f_j(x_k)\right)\quad \text{for } k=1,2,\ldots

where we assume that f_0=1 and the constants \lambda_0,\;\boldsymbol\lambda=(\lambda_1,\ldots,\lambda_n) solve the constrained optimization problem with a_0=1:

:\max_{\lambda_0,\boldsymbol\lambda} \left\{\sum_{j=0}^n \lambda_j a_j-\sum_{k\ge 1} \exp\left(\sum_{j=0}^n \lambda_j f_j(x_k)\right)\right\}\quad\text{subject to } \boldsymbol\lambda\geq\mathbf{0}

Again, if the moment conditions are equalities (instead of inequalities), then the constraint condition \boldsymbol\lambda\geq\mathbf{0} is not present in the optimization.
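
A corresponding sketch for the discrete case (support, target mean, step size, and iteration count are arbitrary illustrative choices): since the dual is concave, even plain gradient ascent over (λ₀, λ₁) converges; a practical implementation would use Newton's method or a convex solver.

import numpy as np

x = np.arange(11, dtype=float)            # support S = {0, 1, ..., 10}
f = np.vstack([np.ones_like(x), x])       # f_0(x) = 1, f_1(x) = x
a = np.array([1.0, 2.0])                  # a_0 = 1 (normalization), a_1 = target mean

lam = np.array([0.0, -1.0])               # arbitrary starting point
for _ in range(200_000):
    p = np.exp(f.T @ lam)                 # exp(lambda_0 + lambda_1 * x_k)
    lam += 0.01 * (a - f @ p)             # gradient ascent on the concave dual

p = np.exp(f.T @ lam)
print(p.sum(), (x * p).sum())             # ~1.0 and ~2.0: constraints are met
print(p)                                  # truncated geometric-like decay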


Proof in the case of equality constraints

In the case of equality constraints, this theorem is proved with the calculus of variations and Lagrange multipliers. The constraints can be written as

:\int_{-\infty}^{\infty} f_j(x)p(x)\,dx=a_j

We consider the functional

:J(p)=\int_{-\infty}^{\infty} p(x)\ln p(x)\,dx-\eta_0\left(\int_{-\infty}^{\infty} p(x)\,dx-1\right)-\sum_{j=1}^{n}\lambda_j\left(\int_{-\infty}^{\infty} f_j(x)p(x)\,dx-a_j\right)

where \eta_0 and the \lambda_j, j\geq 1, are the Lagrange multipliers. The zeroth constraint ensures the second axiom of probability. The other constraints are that the measurements of the functions are given constants up to order n. The entropy attains an extremum when the functional derivative is equal to zero:

:\frac{\delta J}{\delta p}\left(p\right)=\ln p(x)+1-\eta_0-\sum_{j=1}^{n}\lambda_j f_j(x)=0

It is an exercise for the reader that this extremum is indeed a maximum. Therefore, the maximum entropy probability distribution in this case must be of the form (\lambda_0:=\eta_0-1)

:p(x)=e^{\eta_0-1}\cdot e^{\sum_{j=1}^{n}\lambda_j f_j(x)} = \exp\left(\sum_{j=0}^{n}\lambda_j f_j(x)\right) \; .

The proof of the discrete version is essentially the same.


Uniqueness of the maximum

Suppose p, p' are distributions satisfying the expectation-constraints. Letting \alpha\in(0,1) and considering the distribution q=\alpha\cdot p+(1-\alpha)\cdot p', it is clear that this distribution satisfies the expectation-constraints and furthermore has as support \mathrm{supp}(q)=\mathrm{supp}(p)\cup \mathrm{supp}(p'). From basic facts about entropy, it holds that \mathcal{H}(q)\geq \alpha\mathcal{H}(p)+(1-\alpha)\mathcal{H}(p'). Taking limits \alpha\longrightarrow 1 and \alpha\longrightarrow 0 respectively yields \mathcal{H}(q)\geq \mathcal{H}(p),\mathcal{H}(p').

It follows that a distribution satisfying the expectation-constraints and maximising entropy must necessarily have full support, i.e. the distribution is almost everywhere positive. It follows that the maximising distribution must be an internal point in the space of distributions satisfying the expectation-constraints, that is, it must be a local extreme. Thus it suffices to show that the local extreme is unique, in order to show both that the entropy-maximising distribution is unique (and this also shows that the local extreme is the global maximum).

Suppose p, p' are local extremes. Reformulating the above computations, these are characterised by parameters \vec{\lambda},\vec{\lambda}'\in\mathbb{R}^{n} via p(x)=\frac{e^{\langle\vec{\lambda},\vec{f}(x)\rangle}}{C(\vec{\lambda})} and similarly for p', where C(\vec{\lambda})=\int_{S} e^{\langle\vec{\lambda},\vec{f}(x)\rangle}\,dx. We now note a series of identities: Via the satisfaction of the expectation-constraints and utilising gradients/directional derivatives, one has

:D\log(C(\cdot))\vert_{\vec{\lambda}}=\left.\frac{DC(\cdot)}{C(\cdot)}\right\vert_{\vec{\lambda}}=\mathbb{E}_{p}[\vec{f}(X)]=\vec{a}

and similarly for \vec{\lambda}'. Letting u=\vec{\lambda}'-\vec{\lambda}\in\mathbb{R}^{n} one obtains:

:0=\langle u,\vec{a}-\vec{a}\rangle =D_{u}\log(C(\cdot))\vert_{\vec{\lambda}'}-D_{u}\log(C(\cdot))\vert_{\vec{\lambda}} =D_{u}^{2}\log(C(\cdot))\vert_{\vec{\gamma}}

where \vec{\gamma}=\theta\vec{\lambda}+(1-\theta)\vec{\lambda}' for some \theta\in(0,1). Computing further, one has

:\begin{align} 0 &= D_{u}^{2}\log(C(\cdot))\vert_{\vec{\gamma}}\\ &= \left.D_{u}\left(\frac{D_{u}C(\cdot)}{C(\cdot)}\right)\right\vert_{\vec{\gamma}}\\ &= \left.\frac{D_{u}^{2}C(\cdot)}{C(\cdot)}\right\vert_{\vec{\gamma}} -\left.\frac{\left(D_{u}C(\cdot)\right)^{2}}{C(\cdot)^{2}}\right\vert_{\vec{\gamma}}\\ &= \mathbb{E}_{q}\left[\left(\langle u,\vec{f}(X)\rangle\right)^{2}\right]-\left(\mathbb{E}_{q}\left[\langle u,\vec{f}(X)\rangle\right]\right)^{2}=\mathrm{Var}_{q}\left(\langle u,\vec{f}(X)\rangle\right) \end{align}

where q is similar to the distribution above, only parameterised by \vec{\gamma}. ''Assuming'' that no non-trivial linear combination of the observables is almost everywhere (a.e.) constant (which ''e.g.'' holds if the observables are independent and not a.e. constant), it holds that \langle u,\vec{f}(X)\rangle has non-zero variance, unless u=0. By the above equation it is thus clear that the latter must be the case. Hence \vec{\lambda}'-\vec{\lambda}=u=0, so the parameters characterising the local extrema p, p' are identical, which means that the distributions themselves are identical. Thus, the local extreme is unique and, by the above discussion, the maximum is unique, provided a local extreme actually exists.


Caveats

Note that not all classes of distributions contain a maximum entropy distribution. It is possible that a class contains distributions of arbitrarily large entropy (e.g. the class of all continuous distributions on R with mean 0 but arbitrary standard deviation), or that the entropies are bounded above but there is no distribution which attains the maximal entropy (an example is the class of continuous distributions on R with constraints on the first three moments; see Cover, Ch. 12, and the skewness example below). It is also possible that the expected value restrictions for the class ''C'' force the probability distribution to be zero in certain subsets of ''S''. In that case our theorem does not apply, but one can work around this by shrinking the set ''S''.
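
As a small illustration of the first caveat (a sketch; the normal family is just one convenient choice within that class), the differential entropy of N(0, σ²), which always has mean 0, is 0.5 ln(2πeσ²) and grows without bound as σ increases:

import numpy as np

# Sketch: the class of continuous distributions on R with mean 0 has no
# maximum entropy member; the entropy of N(0, sigma^2) is unbounded in sigma.
for sigma in [1.0, 10.0, 1e3, 1e6]:
    print(sigma, 0.5 * np.log(2 * np.pi * np.e * sigma**2))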


Examples

Every probability distribution is trivially a maximum entropy probability distribution under the constraint that the distribution has its own entropy. To see this, rewrite the density as p(x)=\exp{(\ln p(x))} and compare to the expression of the theorem above. By choosing \ln p(x) \rightarrow f(x) to be the measurable function and

:\int \exp{(f(x))}\, f(x)\, dx=-H

to be the constant, p(x) is the maximum entropy probability distribution under the constraint

:\int p(x)f(x)\,dx=-H.

Nontrivial examples are distributions that are subject to multiple constraints that are different from the assignment of the entropy. These are often found by starting with the same procedure \ln p(x) \rightarrow f(x) and finding that f(x) can be separated into parts.

A table of examples of maximum entropy distributions is given in Lisman (1972) and Park & Bera (2009).


Uniform and piecewise uniform distributions

The uniform distribution on the interval [a, b] is the maximum entropy distribution among all continuous distributions which are supported in the interval [a, b], and thus the probability density is 0 outside of the interval. This uniform density can be related to Laplace's principle of indifference, sometimes called the principle of insufficient reason.

More generally, if we are given a subdivision a=a_0 < a_1 < \ldots < a_k = b of the interval [a, b] and probabilities p_1,\ldots,p_k that add up to one, then we can consider the class of all continuous distributions such that

:\operatorname{P}(a_{j-1}\le X < a_j) = p_j \quad \text{for } j=1,\ldots,k

The density of the maximum entropy distribution for this class is constant on each of the intervals [a_{j-1}, a_j). The uniform distribution on the finite set \{x_1,\ldots,x_n\} (which assigns a probability of 1/n to each of these values) is the maximum entropy distribution among all discrete distributions supported on this set.
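
A quick numerical check of the discrete statement (the perturbed probabilities below are an arbitrary example): the uniform distribution on n points attains entropy ln n, and any other distribution on the same points has strictly smaller entropy.

import numpy as np

n = 6
uniform = np.full(n, 1.0 / n)
perturbed = np.array([0.30, 0.20, 0.15, 0.15, 0.10, 0.10])  # arbitrary non-uniform example

def entropy(p):
    p = p[p > 0]                    # convention: 0 log 0 = 0
    return -np.sum(p * np.log(p))

print(entropy(uniform), np.log(n))  # both log(6) ~ 1.7918
print(entropy(perturbed))           # strictly smaller, ~ 1.71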


Positive and specified mean: the exponential distribution

The exponential distribution, for which the density function is

:p(x\mid\lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0, \\ 0 & x < 0, \end{cases}

is the maximum entropy distribution among all continuous distributions supported in [0,∞) that have a specified mean of 1/λ.
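
This can be checked numerically. The sketch below (using SciPy's differential entropy, reported in nats; the gamma and log-normal comparators are arbitrary choices with the same mean) shows the exponential having the largest entropy among a few distributions on [0, ∞) with mean 2.

import numpy as np
from scipy import stats

mean = 2.0
print(stats.expon(scale=mean).entropy())                           # 1 + log(2) ~ 1.693
print(stats.gamma(a=2.0, scale=mean / 2.0).entropy())              # ~ 1.577
print(stats.lognorm(s=1.0, scale=mean * np.exp(-0.5)).entropy())   # ~ 1.612
# All three have mean 2; the exponential's entropy is the largest.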


Specified mean and variance: the normal distribution

The normal distribution N(μ, σ²), for which the density function is

:p(x\mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},

has maximum entropy among all real-valued distributions supported on (−∞,∞) with a specified variance σ² (a particular moment). The same is true when the mean μ and the variance σ² are specified (the first two moments), since entropy is translation invariant on (−∞,∞). Therefore, the assumption of normality imposes the minimal prior structural constraint beyond these moments. (See the differential entropy article for a derivation.)

In the case of distributions supported on [0,∞), the maximum entropy distribution depends on relationships between the first and second moments. In specific cases, it may be the exponential distribution, or may be another distribution, or may be undefinable.
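
A numerical check analogous to the exponential case (the comparators are arbitrary choices with unit variance): among a few real-valued distributions with variance 1, the normal has the largest differential entropy.

import numpy as np
from scipy import stats

print(stats.norm(loc=0, scale=1).entropy())               # 0.5*log(2*pi*e) ~ 1.419
print(stats.laplace(scale=1 / np.sqrt(2)).entropy())      # ~ 1.347  (variance 2b^2 = 1)
print(stats.uniform(loc=0, scale=np.sqrt(12)).entropy())  # ~ 1.242  (variance w^2/12 = 1)
# All three have variance 1; the normal's entropy is the largest.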


Discrete distributions with specified mean

Among all the discrete distributions supported on the set \{x_1,\ldots,x_n\} with a specified mean μ, the maximum entropy distribution has the following shape:

:\operatorname{P}(X=x_k) = Cr^{x_k} \quad\text{for } k=1,\ldots, n

where the positive constants C and r can be determined by the requirements that the sum of all the probabilities must be 1 and the expected value must be μ.

For example, suppose a large number N of dice are thrown, and you are told that the sum of all the shown numbers is S. Based on this information alone, what would be a reasonable assumption for the number of dice showing 1, 2, ..., 6? This is an instance of the situation considered above, with \{x_1,\ldots,x_6\} = \{1,\ldots,6\} and μ = S/N.

Finally, among all the discrete distributions supported on the infinite set \{x_1, x_2, \ldots\} with mean μ, the maximum entropy distribution has the shape:

:\operatorname{P}(X=x_k) = Cr^{x_k} \quad\text{for } k=1,2,\ldots ,

where again the constants C and r are determined by the requirements that the sum of all the probabilities must be 1 and the expected value must be μ. For example, in the case that x_k = k, this gives

:C = \frac{1}{\mu - 1} , \quad\quad r = \frac{\mu - 1}{\mu} ,

so that the respective maximum entropy distribution is the geometric distribution.
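
The dice example can be solved numerically with a one-dimensional root search for r followed by normalization for C; the sketch below uses the illustrative value μ = S/N = 4.5.

import numpy as np
from scipy.optimize import brentq

k = np.arange(1, 7)
mu = 4.5  # prescribed mean S/N (arbitrary example value)

def mean_given_r(r):
    w = r ** k
    return (k * w).sum() / w.sum()

# mean_given_r is increasing in r, so a sign-changing bracket suffices.
r = brentq(lambda r: mean_given_r(r) - mu, 1e-9, 100.0)
C = 1.0 / (r ** k).sum()
p = C * r ** k
print(p, (k * p).sum())   # probabilities increase toward 6; mean ~ 4.5

# For the infinite support {1, 2, ...} with mean mu, the closed form is the
# geometric distribution with C = 1/(mu - 1) and r = (mu - 1)/mu.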


Circular random variables

For a continuous random variable \theta_i distributed about the unit circle, the von Mises distribution maximizes the entropy when the real and imaginary parts of the first circular moment are specified or, equivalently, when the circular mean and circular variance are specified. When the mean and variance of the angles \theta_i modulo 2\pi are specified, the wrapped normal distribution maximizes the entropy.
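
A minimal sketch of fitting the von Mises maximum entropy distribution from the first circular moment (the sample used here is an arbitrary illustration): the mean direction is the argument of the moment, and the concentration κ solves I₁(κ)/I₀(κ) = R, where R is the mean resultant length.

import numpy as np
from scipy.optimize import brentq
from scipy.special import i0e, i1e   # exponentially scaled Bessel functions

theta = np.random.default_rng(0).vonmises(mu=1.0, kappa=3.0, size=10_000)  # example data
z = np.exp(1j * theta).mean()        # first circular moment
mu_hat = np.angle(z)                 # mean direction
R = np.abs(z)                        # mean resultant length

# i1e/i0e equals I1/I0 (the exponential scaling cancels) and avoids overflow.
kappa_hat = brentq(lambda k: i1e(k) / i0e(k) - R, 1e-8, 1e4)
print(mu_hat, kappa_hat)             # close to the generating values 1.0 and 3.0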


Maximizer for specified mean, variance and skew

There exists an upper bound on the entropy of continuous random variables on \mathbb{R} with a specified mean, variance, and skew. However, there is ''no distribution which achieves this upper bound'', because p(x) = c\exp{(\lambda_1 x + \lambda_2 x^2 + \lambda_3 x^3)} is unbounded when \lambda_3 \neq 0 (see Cover & Thomas (2006: chapter 12)).

However, the maximum entropy is ε-achievable: a distribution's entropy can be arbitrarily close to the upper bound. Start with a normal distribution of the specified mean and variance. To introduce a positive skew, perturb the normal distribution upward by a small amount at a value many standard deviations larger than the mean. The skewness, being proportional to the third moment, will be affected more than the lower-order moments.

This is a special case of the general case in which the exponential of any odd-order polynomial in x will be unbounded on \mathbb{R}. For example, c e^{\lambda x} will likewise be unbounded on \mathbb{R}, but when the support is limited to a bounded or semi-bounded interval the upper entropy bound may be achieved (e.g. if x lies in the interval [0,∞) and λ < 0, the exponential distribution will result).


Maximizer for specified mean and deviation risk measure

Every distribution with log-concave density is a maximal entropy distribution with specified mean μ and deviation risk measure D (Grechuk, Molyboha, and Zabarankin (2009), "Maximum Entropy Principle with General Deviation Measures", Mathematics of Operations Research 34(2), 445–467).

In particular, the maximal entropy distribution with specified mean \operatorname{E}(X)=\mu and deviation D(X)=d is:

*The normal distribution N(\mu, d^2), if D(X)=\sqrt{\operatorname{E}[(X-\mu)^2]} is the standard deviation;
*The Laplace distribution, if D(X)=\operatorname{E}(|X-\mu|) is the average absolute deviation;
*The distribution with density of the form f(x)=c \exp{(ax+b[x-\mu]_-^2)}, if D(X)=\sqrt{\operatorname{E}([X-\mu]_-^2)} is the standard lower semi-deviation, where [y]_-:=\max\{0,-y\} and a, b, c are constants.


Other examples

In the table below, each listed distribution maximizes the entropy for a particular set of functional constraints listed in the third column, together with the constraint that x be included in the support of the probability density, which is listed in the fourth column. Several of the examples listed (Bernoulli, geometric, exponential, Laplace, Pareto) are trivially true, because their associated constraints are equivalent to the assignment of their entropy. They are included anyway because their constraint is related to a common or easily measured quantity.

For reference, \Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1}\, dt is the gamma function, \psi(x) = \frac{d}{dx} \ln\Gamma(x)=\frac{\Gamma'(x)}{\Gamma(x)} is the digamma function, B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)} is the beta function, and \gamma is the Euler–Mascheroni constant.

The maximum entropy principle can also be used to upper-bound the entropy of statistical mixtures.


See also

* Exponential family
* Gibbs measure
* Partition function (mathematics)
* Maximal entropy random walk – maximizing entropy rate for a graph


Notes


Citations


References

* F. Nielsen, R. Nock (2017), "MaxEnt upper bounds for the differential entropy of univariate continuous distributions", IEEE Signal Processing Letters, 24(4), 402–406.
* I. J. Taneja (2001), ''Generalized Information Measures and Their Applications''.
* Nader Ebrahimi, Ehsan S. Soofi, Refik Soyer (2008), "Multivariate maximum entropy identification, transformation, and dependence", ''Journal of Multivariate Analysis'' 99: 1217–1231.